Exploration server for computer science research in Lorraine

Warning: this site is under development!
Warning: this site was generated automatically from raw corpora.
The information has therefore not been validated.

Efficient likelihood evaluation and dynamic Gaussian selection for HMM-based speech recognition

Internal identifier: 003B79 (Main/Exploration); previous: 003B78; next: 003B80

Authors: JUN CAI [France, People's Republic of China]; Ghazi Bouselmi [France]; Yves Laprie [France]; Jean-Paul Haton [France]

Source: Computer speech & language (Print), 2009, ISSN 0885-2308

RBID: Francis:09-0158745

Abstract

LVCSR systems are usually based on continuous-density HMMs, which are typically implemented with Gaussian mixture distributions. Such statistical modeling systems tend to run slower than real time, largely because of the heavy computational cost of likelihood evaluation. The objective of our research is to investigate approximate methods that substantially reduce the computational cost of likelihood evaluation without noticeably degrading recognition accuracy. In this paper, the most common techniques for speeding up likelihood computation are classified into three categories: machine optimization, model optimization, and algorithm optimization. Each category is surveyed and summarized by describing and analyzing the basic ideas of the corresponding techniques. The distribution of the numerical values of the Gaussian components within a GMM is evaluated and analyzed to show that the computation of some Gaussians is unnecessary and can therefore be eliminated. Two commonly used likelihood approximation techniques, VQ-based Gaussian selection and partial distance elimination, are analyzed in detail. Based on these analyses, a fast likelihood computation approach called dynamic Gaussian selection (DGS) is proposed. DGS is a one-pass search technique that generates a dynamic shortlist of Gaussians for each state during likelihood computation. In principle, DGS extends both partial distance elimination and best mixture prediction, and it requires no additional memory to store Gaussian shortlists. The DGS algorithm has been implemented by modifying the likelihood computation procedure of the HTK 3.4 system. Experimental results on the TIMIT and WSJ0 corpora indicate that this approach can speed up likelihood computation significantly without introducing noticeable additional recognition errors.
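The abstract names two building blocks that DGS extends: partial distance elimination and best mixture prediction. The Python sketch here illustrates only those two ideas under stated assumptions; it is not the authors' DGS code and not HTK's likelihood routine. It assumes diagonal-covariance Gaussians and the common max (best-component) approximation of the mixture likelihood, log p(x) ≈ max_m [log w_m + log N(x; μ_m, Σ_m)]. Because each dimension subtracts a non-negative term from a component's score, a component can be abandoned as soon as its partial score falls to or below the best full score found so far; evaluating a predicted component first (for instance, the previous frame's winner) tightens that bound early. All names (`DiagGaussian`, `full_score`, `max_approx_exhaustive`, `max_approx_pde`) are hypothetical.

```python
import math
from typing import List, Optional, Sequence, Tuple


class DiagGaussian:
    """Hypothetical container for one weighted, diagonal-covariance component."""

    def __init__(self, weight: float, mean: Sequence[float], var: Sequence[float]):
        self.mean = list(mean)
        self.inv_var = [1.0 / v for v in var]
        # log(weight) plus the Gaussian normalisation term, precomputed once.
        self.offset = math.log(weight) - 0.5 * (
            len(self.mean) * math.log(2.0 * math.pi) + sum(math.log(v) for v in var)
        )


def full_score(g: DiagGaussian, x: Sequence[float]) -> float:
    """log(w) + log N(x; mean, diag(var)), accumulated over every dimension."""
    score = g.offset
    for xi, mi, iv in zip(x, g.mean, g.inv_var):
        score -= 0.5 * (xi - mi) ** 2 * iv
    return score


def max_approx_exhaustive(gmm: List[DiagGaussian], x: Sequence[float]) -> Tuple[float, int]:
    """Baseline: score every component over all dimensions and keep the best."""
    scores = [full_score(g, x) for g in gmm]
    best = max(range(len(gmm)), key=scores.__getitem__)
    return scores[best], best


def max_approx_pde(
    gmm: List[DiagGaussian], x: Sequence[float], predicted: Optional[int] = None
) -> Tuple[float, int, int]:
    """Best-component score with partial distance elimination.

    Returns (best score, best component index, dimension terms evaluated).
    """
    order = list(range(len(gmm)))
    if predicted is not None:
        # Best mixture prediction: try the predicted winner first so that
        # later components are compared against a tight bound.
        order.remove(predicted)
        order.insert(0, predicted)
    best_score, best_idx, ops = -math.inf, -1, 0
    for m in order:
        g = gmm[m]
        score = g.offset
        for xi, mi, iv in zip(x, g.mean, g.inv_var):
            score -= 0.5 * (xi - mi) ** 2 * iv  # non-negative term: score only drops
            ops += 1
            if score <= best_score:
                break  # cannot beat the current best: skip remaining dimensions
        else:
            # No break: the score stayed above the best at every step.
            best_score, best_idx = score, m
    return best_score, best_idx, ops


if __name__ == "__main__":
    gmm = [
        DiagGaussian(0.5, [0.0, 0.0], [1.0, 1.0]),
        DiagGaussian(0.3, [3.0, 3.0], [1.0, 1.0]),
        DiagGaussian(0.2, [-3.0, 2.0], [0.5, 0.5]),
    ]
    x = [2.8, 3.1]
    print(max_approx_exhaustive(gmm, x))  # evaluates 3 x 2 = 6 terms
    print(max_approx_pde(gmm, x, predicted=1))  # same best component, fewer terms
```

In a frame-by-frame decoder one would pass the index returned for frame t as `predicted` for frame t+1 in the same state. Under the max approximation the pruned search returns the same best score as the exhaustive one. The sketch does not build the per-state dynamic shortlist that DGS generates, nor does it implement VQ-based Gaussian selection, which relies on precomputed shortlists stored per codeword.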



The document in XML format

<record>
<TEI>
<teiHeader>
<fileDesc>
<titleStmt>
<title xml:lang="en" level="a">Efficient likelihood evaluation and dynamic Gaussian selection for HMM-based speech recognition</title>
<author>
<name sortKey="Jun Cai" sort="Jun Cai" uniqKey="Jun Cai" last="Jun Cai">JUN CAI</name>
<affiliation wicri:level="3">
<inist:fA14 i1="01">
<s1>Groupe Parole. LORIA-CNRS and INRIA, BP 239</s1>
<s2>54600 Vandoeuvre-les-Nancy</s2>
<s3>FRA</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
<sZ>3 aut.</sZ>
<sZ>4 aut.</sZ>
</inist:fA14>
<country>France</country>
<placeName>
<region type="region" nuts="2">Grand Est</region>
<region type="old region" nuts="2">Lorraine (région)</region>
<settlement type="city">Vandœuvre-lès-Nancy</settlement>
</placeName>
</affiliation>
<affiliation wicri:level="1">
<inist:fA14 i1="02">
<s1>Department of Cognitive Science, Xiamen University</s1>
<s2>361005 Xiamen</s2>
<s3>CHN</s3>
<sZ>1 aut.</sZ>
</inist:fA14>
<country>République populaire de Chine</country>
<wicri:noRegion>361005 Xiamen</wicri:noRegion>
</affiliation>
</author>
<author>
<name sortKey="Bouselmi, Ghazi" sort="Bouselmi, Ghazi" uniqKey="Bouselmi G" first="Ghazi" last="Bouselmi">Ghazi Bouselmi</name>
<affiliation wicri:level="3">
<inist:fA14 i1="01">
<s1>Groupe Parole. LORIA-CNRS and INRIA, BP 239</s1>
<s2>54600 Vandoeuvre-les-Nancy</s2>
<s3>FRA</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
<sZ>3 aut.</sZ>
<sZ>4 aut.</sZ>
</inist:fA14>
<country>France</country>
<placeName>
<region type="region" nuts="2">Grand Est</region>
<region type="old region" nuts="2">Lorraine (région)</region>
<settlement type="city">Vandœuvre-lès-Nancy</settlement>
</placeName>
</affiliation>
</author>
<author>
<name sortKey="Laprie, Yves" sort="Laprie, Yves" uniqKey="Laprie Y" first="Yves" last="Laprie">Yves Laprie</name>
<affiliation wicri:level="3">
<inist:fA14 i1="01">
<s1>Groupe Parole. LORIA-CNRS and INRIA, BP 239</s1>
<s2>54600 Vandoeuvre-les-Nancy</s2>
<s3>FRA</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
<sZ>3 aut.</sZ>
<sZ>4 aut.</sZ>
</inist:fA14>
<country>France</country>
<placeName>
<region type="region" nuts="2">Grand Est</region>
<region type="old region" nuts="2">Lorraine (région)</region>
<settlement type="city">Vandœuvre-lès-Nancy</settlement>
</placeName>
</affiliation>
</author>
<author>
<name sortKey="Haton, Jean Paul" sort="Haton, Jean Paul" uniqKey="Haton J" first="Jean-Paul" last="Haton">Jean-Paul Haton</name>
<affiliation wicri:level="3">
<inist:fA14 i1="01">
<s1>Groupe Parole. LORIA-CNRS and INRIA, BP 239</s1>
<s2>54600 Vandoeuvre-les-Nancy</s2>
<s3>FRA</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
<sZ>3 aut.</sZ>
<sZ>4 aut.</sZ>
</inist:fA14>
<country>France</country>
<placeName>
<region type="region" nuts="2">Grand Est</region>
<region type="old region" nuts="2">Lorraine (région)</region>
<settlement type="city">Vandœuvre-lès-Nancy</settlement>
</placeName>
<placeName>
<settlement type="city">Nancy</settlement>
<region type="region" nuts="2">Grand Est</region>
<region type="region" nuts="2">Lorraine (région)</region>
</placeName>
<orgName type="laboratoire" n="5">Laboratoire lorrain de recherche en informatique et ses applications</orgName>
<orgName type="university">Université de Lorraine</orgName>
<orgName type="institution">Centre national de la recherche scientifique</orgName>
<orgName type="institution">Institut national de recherche en informatique et en automatique</orgName>
</affiliation>
</author>
</titleStmt>
<publicationStmt>
<idno type="wicri:source">INIST</idno>
<idno type="inist">09-0158745</idno>
<date when="2009">2009</date>
<idno type="stanalyst">FRANCIS 09-0158745 INIST</idno>
<idno type="RBID">Francis:09-0158745</idno>
<idno type="wicri:Area/PascalFrancis/Corpus">000293</idno>
<idno type="wicri:Area/PascalFrancis/Curation">000747</idno>
<idno type="wicri:Area/PascalFrancis/Checkpoint">000243</idno>
<idno type="wicri:explorRef" wicri:stream="PascalFrancis" wicri:step="Checkpoint">000243</idno>
<idno type="wicri:doubleKey">0885-2308:2009:Jun Cai:efficient:likelihood:evaluation</idno>
<idno type="wicri:Area/Main/Merge">003C77</idno>
<idno type="wicri:Area/Main/Curation">003B79</idno>
<idno type="wicri:Area/Main/Exploration">003B79</idno>
</publicationStmt>
<sourceDesc>
<biblStruct>
<analytic>
<title xml:lang="en" level="a">Efficient likelihood evaluation and dynamic Gaussian selection for HMM-based speech recognition</title>
<author>
<name sortKey="Jun Cai" sort="Jun Cai" uniqKey="Jun Cai" last="Jun Cai">JUN CAI</name>
<affiliation wicri:level="3">
<inist:fA14 i1="01">
<s1>Groupe Parole. LORIA-CNRS and INRIA, BP 239</s1>
<s2>54600 Vandoeuvre-les-Nancy</s2>
<s3>FRA</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
<sZ>3 aut.</sZ>
<sZ>4 aut.</sZ>
</inist:fA14>
<country>France</country>
<placeName>
<region type="region" nuts="2">Grand Est</region>
<region type="old region" nuts="2">Lorraine (région)</region>
<settlement type="city">Vandœuvre-lès-Nancy</settlement>
</placeName>
</affiliation>
<affiliation wicri:level="1">
<inist:fA14 i1="02">
<s1>Department of Cognitive Science, Xiamen University</s1>
<s2>361005 Xiamen</s2>
<s3>CHN</s3>
<sZ>1 aut.</sZ>
</inist:fA14>
<country>République populaire de Chine</country>
<wicri:noRegion>361005 Xiamen</wicri:noRegion>
</affiliation>
</author>
<author>
<name sortKey="Bouselmi, Ghazi" sort="Bouselmi, Ghazi" uniqKey="Bouselmi G" first="Ghazi" last="Bouselmi">Ghazi Bouselmi</name>
<affiliation wicri:level="3">
<inist:fA14 i1="01">
<s1>Groupe Parole. LORIA-CNRS and INRIA, BP 239</s1>
<s2>54600 Vandoeuvre-les-Nancy</s2>
<s3>FRA</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
<sZ>3 aut.</sZ>
<sZ>4 aut.</sZ>
</inist:fA14>
<country>France</country>
<placeName>
<region type="region" nuts="2">Grand Est</region>
<region type="old region" nuts="2">Lorraine (région)</region>
<settlement type="city">Vandœuvre-lès-Nancy</settlement>
</placeName>
</affiliation>
</author>
<author>
<name sortKey="Laprie, Yves" sort="Laprie, Yves" uniqKey="Laprie Y" first="Yves" last="Laprie">Yves Laprie</name>
<affiliation wicri:level="3">
<inist:fA14 i1="01">
<s1>Groupe Parole. LORIA-CNRS and INRIA, BP 239</s1>
<s2>54600 Vandoeuvre-les-Nancy</s2>
<s3>FRA</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
<sZ>3 aut.</sZ>
<sZ>4 aut.</sZ>
</inist:fA14>
<country>France</country>
<placeName>
<region type="region" nuts="2">Grand Est</region>
<region type="old region" nuts="2">Lorraine (région)</region>
<settlement type="city">Vandœuvre-lès-Nancy</settlement>
</placeName>
</affiliation>
</author>
<author>
<name sortKey="Haton, Jean Paul" sort="Haton, Jean Paul" uniqKey="Haton J" first="Jean-Paul" last="Haton">Jean-Paul Haton</name>
<affiliation wicri:level="3">
<inist:fA14 i1="01">
<s1>Groupe Parole. LORIA-CNRS and INRIA, BP 239</s1>
<s2>54600 Vandoeuvre-les-Nancy</s2>
<s3>FRA</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
<sZ>3 aut.</sZ>
<sZ>4 aut.</sZ>
</inist:fA14>
<country>France</country>
<placeName>
<region type="region" nuts="2">Grand Est</region>
<region type="old region" nuts="2">Lorraine (région)</region>
<settlement type="city">Vandœuvre-lès-Nancy</settlement>
</placeName>
<placeName>
<settlement type="city">Nancy</settlement>
<region type="region" nuts="2">Grand Est</region>
<region type="region" nuts="2">Lorraine (région)</region>
</placeName>
<orgName type="laboratoire" n="5">Laboratoire lorrain de recherche en informatique et ses applications</orgName>
<orgName type="university">Université de Lorraine</orgName>
<orgName type="institution">Centre national de la recherche scientifique</orgName>
<orgName type="institution">Institut national de recherche en informatique et en automatique</orgName>
</affiliation>
</author>
</analytic>
<series>
<title level="j" type="main">Computer speech &amp; language : (Print)</title>
<title level="j" type="abbreviated">Comput. speech lang. : (Print)</title>
<idno type="ISSN">0885-2308</idno>
<imprint>
<date when="2009">2009</date>
</imprint>
</series>
</biblStruct>
</sourceDesc>
<seriesStmt>
<title level="j" type="main">Computer speech &amp; language : (Print)</title>
<title level="j" type="abbreviated">Comput. speech lang. : (Print)</title>
<idno type="ISSN">0885-2308</idno>
</seriesStmt>
</fileDesc>
<profileDesc>
<textClass>
<keywords scheme="KwdEn" xml:lang="en">
<term>Assessment</term>
<term>Speech recognition</term>
<term>Statistical Model</term>
</keywords>
<keywords scheme="Pascal" xml:lang="fr">
<term>Evaluation</term>
<term>Reconnaissance de la parole</term>
<term>Modèle statistique</term>
</keywords>
</textClass>
</profileDesc>
</teiHeader>
<front>
<div type="abstract" xml:lang="en">LVCSR systems are usually based on continuous-density HMMs, which are typically implemented with Gaussian mixture distributions. Such statistical modeling systems tend to run slower than real time, largely because of the heavy computational cost of likelihood evaluation. The objective of our research is to investigate approximate methods that substantially reduce the computational cost of likelihood evaluation without noticeably degrading recognition accuracy. In this paper, the most common techniques for speeding up likelihood computation are classified into three categories: machine optimization, model optimization, and algorithm optimization. Each category is surveyed and summarized by describing and analyzing the basic ideas of the corresponding techniques. The distribution of the numerical values of the Gaussian components within a GMM is evaluated and analyzed to show that the computation of some Gaussians is unnecessary and can therefore be eliminated. Two commonly used likelihood approximation techniques, VQ-based Gaussian selection and partial distance elimination, are analyzed in detail. Based on these analyses, a fast likelihood computation approach called dynamic Gaussian selection (DGS) is proposed. DGS is a one-pass search technique that generates a dynamic shortlist of Gaussians for each state during likelihood computation. In principle, DGS extends both partial distance elimination and best mixture prediction, and it requires no additional memory to store Gaussian shortlists. The DGS algorithm has been implemented by modifying the likelihood computation procedure of the HTK 3.4 system. Experimental results on the TIMIT and WSJ0 corpora indicate that this approach can speed up likelihood computation significantly without introducing noticeable additional recognition errors.</div>
</front>
</TEI>
<affiliations>
<list>
<country>
<li>France</li>
<li>République populaire de Chine</li>
</country>
<region>
<li>Grand Est</li>
<li>Lorraine (région)</li>
</region>
<settlement>
<li>Nancy</li>
<li>Vandœuvre-lès-Nancy</li>
</settlement>
<orgName>
<li>Centre national de la recherche scientifique</li>
<li>Institut national de recherche en informatique et en automatique</li>
<li>Laboratoire lorrain de recherche en informatique et ses applications</li>
<li>Université de Lorraine</li>
</orgName>
</list>
<tree>
<country name="France">
<region name="Grand Est">
<name sortKey="Jun Cai" sort="Jun Cai" uniqKey="Jun Cai" last="Jun Cai">JUN CAI</name>
</region>
<name sortKey="Bouselmi, Ghazi" sort="Bouselmi, Ghazi" uniqKey="Bouselmi G" first="Ghazi" last="Bouselmi">Ghazi Bouselmi</name>
<name sortKey="Haton, Jean Paul" sort="Haton, Jean Paul" uniqKey="Haton J" first="Jean-Paul" last="Haton">Jean-Paul Haton</name>
<name sortKey="Laprie, Yves" sort="Laprie, Yves" uniqKey="Laprie Y" first="Yves" last="Laprie">Yves Laprie</name>
</country>
<country name="République populaire de Chine">
<noRegion>
<name sortKey="Jun Cai" sort="Jun Cai" uniqKey="Jun Cai" last="Jun Cai">JUN CAI</name>
</noRegion>
</country>
</tree>
</affiliations>
</record>

To handle this document under Unix (Dilib)

EXPLOR_STEP=$WICRI_ROOT/Wicri/Lorraine/explor/InforLorV4/Data/Main/Exploration
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 003B79 | SxmlIndent | more

Or

HfdSelect -h $EXPLOR_AREA/Data/Main/Exploration/biblio.hfd -nk 003B79 | SxmlIndent | more

To link to this page within the Wicri network

{{Explor lien
   |wiki=    Wicri/Lorraine
   |area=    InforLorV4
   |flux=    Main
   |étape=   Exploration
   |type=    RBID
   |clé=     Francis:09-0158745
   |texte=   Efficient likelihood evaluation and dynamic Gaussian selection for HMM-based speech recognition
}}

Wicri

This area was generated with Dilib version V0.6.33.
Data generation: Mon Jun 10 21:56:28 2019. Site generation: Fri Feb 25 15:29:27 2022